Is attention all you need in medical image analysis? A review
Medical imaging is a key component in clinical diagnosis, treatment planning
and clinical trial design, accounting for almost 90% of all healthcare data.
Convolutional neural networks (CNNs) have achieved substantial performance gains in medical image analysis (MIA) in recent years. CNNs efficiently model local pixel interactions and can be trained on
small-scale medical imaging data. The main disadvantage of typical CNN models is that they
ignore global pixel relationships within images, which limits their
generalisation ability to understand out-of-distribution data with different
'global' information. Recent progress in Artificial Intelligence has given
rise to Transformers, which can learn global relationships from data. However,
full Transformer models need to be trained on large-scale data and involve
tremendous computational complexity. Attention and Transformer components
(Transf/Attention), which retain the ability to model global relationships,
have been proposed as lighter alternatives to full Transformers. Recently,
there has been an increasing trend to cross-pollinate the complementary
local and global properties of CNN and Transf/Attention architectures, leading
to a new era of hybrid models. The past years have witnessed substantial growth
in hybrid CNN-Transf/Attention models across diverse MIA problems. In this
systematic review, we survey existing hybrid CNN-Transf/Attention models,
review and unravel key architectural designs, analyse breakthroughs, and
evaluate current and future opportunities as well as challenges. We also
introduce a comprehensive analysis framework on generalisation opportunities
of scientific and clinical impact, which can stimulate new data-driven domain
generalisation and adaptation methods.
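As an illustration of the global modelling that attention provides (a generic sketch, not an architecture from any surveyed paper), scaled dot-product self-attention lets every image patch weigh every other patch, in contrast to the local receptive fields of a convolution:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: every token (e.g. an image patch)
    attends to every other token, capturing the global pixel
    relationships that plain convolutions miss."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 patches, 8-dim embeddings
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8): each patch is a global mixture of all patches
```

Hybrid CNN-Transf/Attention models interleave such global mixing with convolutional stages; the quadratic cost of the `scores` matrix in the number of patches is the complexity the lighter Transf/Attention variants aim to reduce.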
Factorised spatial representation learning: application in semi-supervised myocardial segmentation
The success and generalisation of deep learning algorithms heavily depend on
learning good feature representations. In medical imaging this entails
representing anatomical information, as well as properties related to the
specific imaging setting. Anatomical information is required to perform further
analysis, whereas imaging information is key to disentangle scanner variability
and potential artefacts. The ability to factorise these would allow for
training algorithms only on the relevant information according to the task. To
date, such factorisation has not been attempted. In this paper, we propose a
methodology of latent space factorisation relying on the cycle-consistency
principle. As an example application, we consider cardiac MR segmentation,
where we separate information related to the myocardium from other features
related to imaging and surrounding substructures. We demonstrate the proposed
method's utility in a semi-supervised setting: we use very few labelled images
together with many unlabelled images to train a myocardium segmentation neural
network. Specifically, we achieve comparable performance to fully supervised
networks using a fraction of labelled images in experiments on ACDC and a
dataset from Edinburgh Imaging Facility QMRI. Code will be made available at
https://github.com/agis85/spatial_factorisation.
Comment: Accepted in MICCAI 201
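To illustrate the cycle-consistency principle the factorisation relies on, here is a deliberately toy linear sketch (hypothetical encoder/decoder, not the paper's networks): an image is encoded into two factors and must be reconstructable from them, so splitting the latent space cannot silently discard information.

```python
import numpy as np

# Toy linear "encoder": one projection for anatomy-like content,
# one for imaging-like content (weights are illustrative only).
rng = np.random.default_rng(0)
W_anat = rng.normal(size=(16, 8))
W_mod = rng.normal(size=(16, 8))

def encode(img):
    return img @ W_anat, img @ W_mod        # (anatomy factor, imaging factor)

def decode(anatomy, modality):
    # Least-squares "decoder" for illustration: invert the stacked encoder.
    W = np.concatenate([W_anat, W_mod], axis=1)
    z = np.concatenate([anatomy, modality], axis=-1)
    return z @ np.linalg.pinv(W)

img = rng.normal(size=(16,))
rec = decode(*encode(img))
cycle_loss = np.mean(np.abs(img - rec))     # L1 cycle-consistency loss ~ 0
```

In the actual method the encoder and decoder are neural networks and the cycle-consistency loss is minimised during training, so only the anatomy factor needs labels for the segmentation task.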
You Don't Have to Be Perfect to Be Amazing: Unveil the Utility of Synthetic Images
Synthetic images generated from deep generative models have the potential to
address data scarcity and data privacy issues. The selection of synthesis
models is mostly based on image quality measurements, and most researchers
favor models that produce realistic images, i.e., images with good fidelity
scores such as a low Fréchet Inception Distance (FID) and a high Peak
Signal-to-Noise Ratio (PSNR). However, the quality of synthetic images is not
limited to fidelity, and a wide spectrum of metrics should be evaluated to
comprehensively measure the quality of synthetic images. In addition, quality
metrics are not reliable predictors of the utility of synthetic images, and the
relations between these evaluation metrics are not yet clear. In this work, we
have established a comprehensive set of evaluators for synthetic images,
including fidelity, variety, privacy, and utility. By analyzing more than 100k
chest X-ray images and their synthetic copies, we have demonstrated that there
is an inevitable trade-off between synthetic image fidelity, variety, and
privacy. In addition, we have empirically demonstrated that a high utility
score does not require images with both high fidelity and high variety: for
intra- and cross-task data augmentation, mode-collapsed and low-fidelity images
can still demonstrate high utility. Finally, our experiments have also shown
that it is possible to produce images with both high utility and high privacy,
which provides a strong rationale for the use of deep generative models in
privacy-preserving applications. Our study offers comprehensive guidance
for the evaluation of synthetic images and can elicit further developments of
utility-aware deep generative models in medical image synthesis.
Comment: 10 pages, 4 figures, MICCAI Early Acceptance
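Of the fidelity scores mentioned, PSNR is simple enough to sketch directly (a generic implementation for illustration, not the paper's evaluation code):

```python
import numpy as np

def psnr(reference, synthetic, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means the synthetic
    image is pixel-wise closer to the reference."""
    diff = reference.astype(np.float64) - synthetic.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((64, 64), 128, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 138                     # a single corrupted pixel
print(psnr(ref, noisy))               # high PSNR: images nearly identical
```

Note that PSNR, like FID, measures only fidelity; as the abstract argues, a high score here says nothing about variety, privacy, or downstream utility.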
SaliencyGAN: Deep Learning Semisupervised Salient Object Detection in the Fog of IoT
In the modern Internet of Things (IoT), visual analysis and prediction are often performed by deep learning models. Salient object detection (SOD) is a fundamental preprocessing step for these applications. Executing SOD on fog devices is a challenging task due to the diversity of data and fog devices. To adopt convolutional neural networks (CNNs) on fog-cloud infrastructures for SOD-based applications, we introduce a semisupervised adversarial learning method in this article. The proposed model, named SaliencyGAN, is empowered by a novel concatenated generative adversarial network (GAN) framework with partially shared parameters. The backbone CNN can be chosen flexibly based on the specific devices and applications. Meanwhile, our method uses both labeled and unlabeled data from different problem domains for training. Using multiple popular benchmark datasets, we compared our SaliencyGAN, trained with 10-100% labeled data, to state-of-the-art baseline methods. SaliencyGAN achieved performance comparable to the supervised baselines when the percentage of labeled data reached 30%, and outperformed the weakly supervised and unsupervised baselines. Furthermore, our ablation study shows that SaliencyGAN was more robust to the common "mode missing" (or "mode collapse") issue than the selected popular GAN models, and the visualized ablation results show that SaliencyGAN learned a better estimation of the data distributions. To the best of our knowledge, this is the first IoT-oriented semisupervised SOD method.
Industrial Cyber-Physical Systems-based Cloud IoT Edge for Federated Heterogeneous Distillation.
Deep convolutional networks have achieved remarkable performance in a wide range of vision-based tasks in the modern Internet of Things (IoT). Due to privacy issues and transmission costs, manually annotated data for training deep learning models are usually stored at different sites with fog and edge devices of various computing capacities. It has been shown that knowledge distillation can effectively compress well-trained neural networks into lightweight models suitable for particular devices. However, different fog and edge devices may perform different sub-tasks, and simply performing model compression on powerful cloud servers fails to make use of the private data stored at the different sites. To overcome these obstacles, we propose a novel knowledge distillation method for object recognition in real-world IoT scenarios. Our method enables flexible bidirectional online training of heterogeneous models over distributed datasets with a new "brain storming" mechanism and optimizable temperature parameters. In our comparison experiments, this heterogeneous brain-storming method was compared to multiple state-of-the-art single-model compression methods, as well as the newest heterogeneous and homogeneous multi-teacher knowledge distillation methods, and it outperformed the state of the art in both conventional and heterogeneous tasks. Further analysis of the ablation experiment results shows that introducing trainable temperature parameters into the conventional knowledge distillation loss can effectively ease the learning process of student networks in different methods. To the best of our knowledge, this is the first IoT-oriented method that allows asynchronous bidirectional heterogeneous knowledge distillation in deep networks.
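The temperature-scaled distillation loss the method builds on can be sketched as follows. This shows standard unidirectional Hinton-style distillation with a fixed temperature; the paper's contribution makes the temperature trainable and the training bidirectional, which is not reproduced here (all names and numbers are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; larger T flattens the distribution."""
    z = np.asarray(z, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student
    outputs, with the usual T^2 rescaling of the gradient magnitude."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * np.log(p / q))) * T * T

teacher = [6.0, 2.0, 1.0]
aligned = [5.5, 2.2, 0.8]     # student close to the teacher
off     = [1.0, 6.0, 0.5]     # student disagreeing with the teacher
print(distillation_loss(aligned, teacher) < distillation_loss(off, teacher))  # True
```

A high temperature exposes the teacher's relative ranking of the wrong classes ("dark knowledge"); making T a trainable parameter, as the abstract describes, lets each student control how soft that target is during learning.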
Disentangled representation learning in cardiac image analysis
Typically, a medical image offers spatial information on the anatomy (and pathology) modulated by imaging specific characteristics. Many imaging modalities including Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) can be interpreted in this way. We can venture further and consider that a medical image naturally factors into some spatial factors depicting anatomy and factors that denote the imaging characteristics. Here, we explicitly learn this decomposed (disentangled) representation of imaging data, focusing in particular on cardiac images. We propose Spatial Decomposition Network (SDNet), which factorises 2D medical images into spatial anatomical factors and non-spatial modality factors. We demonstrate that this high-level representation is ideally suited for several medical image analysis tasks, such as semi-supervised segmentation, multi-task segmentation and regression, and image-to-image synthesis. Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images. Critically, we show that our factorised representation also benefits from supervision obtained either when we use auxiliary tasks to train the model in a multi-task setting (e.g. regressing to known cardiac indices), or when aggregating multimodal data from different sources (e.g. pooling together MRI and CT data). To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa, by swapping the modality factors. We also demonstrate that the factor holding image specific information can be used to predict the input modality with high accuracy. Code will be made available at https://github.com/agis85/anatomy_modality_decomposition
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
In this paper, we present our approach and the main results from the Essex NLIP Team's participation in the MediaEval 2020 Predicting Media Memorability task. The task requires participants to build systems that can predict short-term and long-term memorability scores for the real-world video samples provided. The focus of our approach is on the use of colour-based visual features as well as the video annotation meta-data; in addition, hyper-parameter tuning was explored. Despite the simplicity of the methodology, our approach achieves competitive results. We investigated the use of different visual features and assessed the performance of various regression models, with Random Forest regression as our final model for predicting video memorability.
Manganese-enhanced Magnetic Resonance Imaging in Dilated Cardiomyopathy and Hypertrophic Cardiomyopathy.
Patients with dilated cardiomyopathy (n = 10) or hypertrophic cardiomyopathy (n = 17) underwent both gadolinium- and manganese contrast-enhanced magnetic resonance imaging and were compared with healthy volunteers (n = 20). Differential manganese uptake (Ki) was assessed using a two-compartment Patlak model. Compared with healthy volunteers, reduction in T1 with manganese-enhanced magnetic resonance imaging was lower in patients with dilated cardiomyopathy [mean reduction 257 ± 45 (21%) vs. 288 ± 34 (26%) ms, P < 0.001], with higher T1 at 40 min (948 ± 57 vs. 834 ± 28 ms, P < 0.0001). In patients with hypertrophic cardiomyopathy, reductions in T1 were less than in healthy volunteers [mean reduction 251 ± 86 (18%) and 277 ± 34 (23%) vs. 288 ± 34 (26%) ms, with and without fibrosis respectively, P < 0.001]. Myocardial manganese uptake was modelled; the rate of uptake was reduced in both dilated and hypertrophic cardiomyopathy in comparison with healthy volunteers (mean Ki 19 ± 4, 19 ± 3, and 23 ± 4 mL/100 g/min, respectively; P = 0.0068). In patients with dilated cardiomyopathy, manganese uptake rate correlated with left ventricular ejection fraction (r2 = 0.61, P = 0.009). The rate of myocardial manganese uptake demonstrated stepwise reductions across healthy myocardium, hypertrophic cardiomyopathy without fibrosis, and hypertrophic cardiomyopathy with fibrosis, providing absolute discrimination between healthy myocardium and fibrosed myocardium (mean Ki 23 ± 4, 19 ± 3, and 13 ± 4 mL/100 g/min, respectively; P < 0.0001).
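The two-compartment Patlak analysis used above estimates the uptake rate Ki as the slope of a linearised plot of tissue-to-plasma concentration ratio against normalised integrated plasma concentration. A minimal sketch with synthetic, made-up concentration curves (units, values, and the plasma input function are illustrative, not from the study):

```python
import numpy as np

# Illustrative Patlak analysis: for an irreversible-uptake tracer,
#   C_tissue(t)/C_plasma(t) = Ki * [integral of C_plasma]/C_plasma(t) + V0,
# so Ki is the slope of the Patlak plot and V0 the intercept.
t = np.linspace(1, 40, 40)                      # minutes
c_plasma = 5.0 * np.exp(-t / 30.0)              # made-up plasma input curve
true_ki, v0 = 0.20, 0.10                        # assumed uptake rate, initial volume
integral = np.cumsum(c_plasma) * (t[1] - t[0])  # running integral of the input
c_tissue = true_ki * integral + v0 * c_plasma   # simulated tissue curve

x = integral / c_plasma                         # "Patlak time"
y = c_tissue / c_plasma
ki_est, intercept = np.polyfit(x, y, 1)         # slope = Ki, intercept = V0
print(ki_est, intercept)                        # recovers true_ki and v0
```

The stepwise Ki reductions reported in the abstract correspond to shallower Patlak slopes in fibrosed myocardium, where manganese uptake by viable cardiomyocytes is diminished.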
Multimodal cardiac segmentation using disentangled representation learning.
Magnetic Resonance (MR) protocols use several sequences to evaluate pathology and organ status. Yet, despite recent advances, the analysis of each sequence’s images (modality hereafter) is treated in isolation. We propose a method suitable for multimodal and multi-input learning and analysis, that disentangles anatomical and imaging factors, and combines anatomical content across the modalities to extract more accurate segmentation masks. Mis-registrations between the inputs are handled with a Spatial Transformer Network, which non-linearly aligns the (now intensity-invariant) anatomical factors. We demonstrate applications in Late Gadolinium Enhanced (LGE) and cine MRI segmentation. We show that multi-input outperforms single-input models, and that we can train a (semi-supervised) model with few (or no) annotations for one of the modalities. Code is available at https://github.com/agis85/multimodal_segmentation